Fix ICE when data section exceeds 2^12 words #7643

Open
Dnreikronos wants to merge 9 commits into FuelLabs:master from Dnreikronos:fix/large-data-section-ice

@Dnreikronos Dnreikronos commented May 27, 2026

Description

Closes #7612.

realize_load() panicked whenever a data section offset didn't fit the 12-bit immediate on LW/LB (4095 words, ~32KB). Any program whose data section grew past that limit hit an ICE during codegen.

Copy-type loads now pick their encoding by offset size: a single LW/LB when the offset fits 12 bits (unchanged), otherwise MOVI + ADD $ds + LW/LB, with a clear panic once the offset can't fit the 18-bit MOVI immediate (~256KB). Non-copy loads pre-insert their pointer into the data section in a separate pass keyed by load site, so the section stays immutable while bytecode is emitted and the inner pointer load can be one or three instructions. Size estimation in op_size_in_bytes and instruction_size_not_far_jump was updated to count both forms, keeping jump offsets and the data-section offset aligned.
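The tiering described above can be sketched roughly as follows. This is a minimal illustration, not the code in realize_load(): the names LoadEncoding, choose_encoding, and instruction_count are hypothetical, the offset is treated as one plain number for both limits (the real code may measure the two tiers in different units), and the out-of-range case returns an error where the PR panics.

```rust
/// Hypothetical model of the two load encodings described in this PR.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum LoadEncoding {
    /// One LW/LB with the offset in its 12-bit immediate.
    Single,
    /// MOVI + ADD $ds + LW/LB for offsets that need more than 12 bits.
    Wide,
}

/// Largest value a 12-bit immediate can hold (4095).
const MAX_IMM12: u64 = (1 << 12) - 1;
/// Largest value an 18-bit immediate can hold (262143).
const MAX_IMM18: u64 = (1 << 18) - 1;

/// Pick an encoding for `offset`, or report that it cannot be encoded.
/// Assumption: the real code panics here; an Err keeps the sketch testable.
fn choose_encoding(offset: u64) -> Result<LoadEncoding, String> {
    if offset <= MAX_IMM12 {
        Ok(LoadEncoding::Single)
    } else if offset <= MAX_IMM18 {
        Ok(LoadEncoding::Wide)
    } else {
        Err(format!(
            "data section offset {} does not fit an 18-bit immediate",
            offset
        ))
    }
}

/// Number of instructions each encoding emits. Size estimation has to
/// return the same counts, otherwise jump offsets drift.
fn instruction_count(encoding: LoadEncoding) -> u64 {
    match encoding {
        LoadEncoding::Single => 1,
        LoadEncoding::Wide => 3,
    }
}
```

In this sketch, an offset of 4095 still selects the single-instruction form and 4096 is the first value that selects the three-instruction form.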

The new layout shifts bytecode size and gas for a few programs, so the dbg, vec, and const_of_contract_call snapshots were refreshed. Added a large_data_section regression test that builds 4200+ distinct u64 entries to exercise the wide-offset path.

Checklist

  • I have linked to any relevant issues.
  • I have commented my code, particularly in hard-to-understand areas.
  • I have updated the documentation where relevant (API docs, the reference, and the Sway book).
  • I have added tests that prove my fix is effective or that my feature works.
  • I have added (or requested a maintainer to add) the necessary Breaking* or New Feature labels where relevant.
  • I have done my best to ensure that my PR adheres to the Fuel Labs Code Review Standards.
  • I have requested a review from the relevant team or maintainers.

LW/LB instructions have 12-bit immediate offsets (max 4095). When
copy-type loads exceeded this, realize_load() panicked. Replace
the panic with a three-tier approach: <=12-bit uses single LW/LB,
>12-bit uses MOVI+ADD+LW/LB (3 instructions), >18-bit panics with
a clear message. Update op_size_in_bytes and
instruction_size_not_far_jump to return matching instruction counts
so jump offsets remain correct.

Exercises the >12-bit offset path by creating 4200+ distinct u64
data section entries (values >262143 to avoid MOVI inlining).
Verifies correct codegen via checksum of all loaded values.
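
The two conditions on the test data (more than 4095 entries, every value above 262143) can be illustrated with a short sketch. It is written in Rust rather than Sway, the function names are hypothetical, and the wrapping-sum checksum is an assumption; the actual large_data_section test may build and verify its values differently.

```rust
use std::collections::HashSet;

/// Largest value an 18-bit immediate can hold. Per the commit message,
/// the test uses larger constants so they are not inlined via MOVI.
const MAX_IMM18: u64 = (1 << 18) - 1;

/// Hypothetical generator: `count` distinct u64 values, all above MAX_IMM18.
fn distinct_large_values(count: u64) -> Vec<u64> {
    (1..=count).map(|i| MAX_IMM18 + i).collect()
}

/// True when no value repeats, so each one would need its own entry.
fn all_distinct(values: &[u64]) -> bool {
    let unique: HashSet<u64> = values.iter().copied().collect();
    unique.len() == values.len()
}

/// Wrapping sum over all values, used here as a simple checksum.
fn checksum(values: &[u64]) -> u64 {
    values.iter().fold(0u64, |acc, v| acc.wrapping_add(*v))
}
```

With 4200 such values, the entry count passes 4095, so at least some loads would have to take the wide-offset path.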
@Dnreikronos Dnreikronos requested a review from a team as a code owner May 27, 2026 12:09
fuel-cla-bot Bot commented May 27, 2026

Thanks for the contribution! Before we can merge this, we need @Dnreikronos to sign the Fuel Labs Contributor License Agreement.

cursor Bot commented May 27, 2026

PR Summary

Medium Risk
Touches core Fuel bytecode emission and label offset math; wrong sizing would miscompile jumps or pointers, though e2e coverage was added for the large data path.

Overview
Fixes an ICE when the Fuel data section grows past what a single LW/LB immediate can encode (~4096 words / ~32KB).

realize_load now uses one LW/LB when the offset fits 12 bits; otherwise it emits MOVI + ADD $ds + LW/LB, with an assert if the offset exceeds the 18-bit MOVI limit (~256KB). Non-copy loads no longer dedupe pointers by raw offset: DataSection keys pointers by (source_data_id, load_site_offset) and pre-inserts them in to_bytecode_mut before emission so configurables addresses stay stable.

Bytecode sizing (op_size_in_bytes, instruction_size_not_far_jump, worst_pointer_word_offset) was updated so jump/label layout matches the variable instruction counts for large offsets and large pointer slots.

Adds a large_data_section e2e test (4200+ distinct constants) and refreshes a few gas/size snapshots; minor parser loop cleanup in consume_while_line_equals.

Reviewed by Cursor Bugbot for commit 4eff7df.

Comment thread sway-core/src/asm_generation/finalized_asm.rs Outdated
Comment thread sway-core/src/asm_generation/finalized_asm.rs Outdated
The hardcoded -4 byte $pc correction assumed the inner copy-type load
always emits 1 instruction. When the pointer entry's word offset exceeds
12 bits (data section >32KB), the inner load emits 3 instructions
(MOVI+ADD+LW/LB), making the ADD $pc land 8 bytes further than the
stored pointer value accounts for.
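
The arithmetic behind the 8-byte discrepancy can be checked with a small sketch. It assumes each instruction is 4 bytes wide and that the correction equals the size of the inner load sequence; pc_correction_bytes and error_with_fixed_correction are hypothetical helpers, not functions from finalized_asm.rs.

```rust
/// Size of one instruction in bytes; the -4 correction for a single
/// instruction in the commit message is consistent with this.
const INSTRUCTION_BYTES: i64 = 4;

/// Hypothetical helper: byte correction applied to the stored pointer so
/// that the later ADD $pc lands on the intended address.
fn pc_correction_bytes(inner_load_instructions: i64) -> i64 {
    -(inner_load_instructions * INSTRUCTION_BYTES)
}

/// How many bytes too far the old hardcoded -4 correction lands for an
/// inner load of the given length.
fn error_with_fixed_correction(inner_load_instructions: i64) -> i64 {
    -4 - pc_correction_bytes(inner_load_instructions)
}
```

Under these assumptions the fixed correction is exact for a one-instruction load and off by 8 bytes for the three-instruction MOVI+ADD+LW/LB form, which matches the figure quoted above.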

Three fixes:
- Dynamic $pc correction based on predicted pointer entry position
- Pointer lookup keyed by (source DataId, instruction offset) instead of
  pointer value, preventing collisions when the same non-copy entry is
  loaded at multiple sites
- Worst-case pointer word offset for size estimates, using
  non_configurables_size_in_bytes instead of total section size to avoid
  configurable entries inflating the threshold
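
The second fix, keying pointer lookup by load site, can be sketched as below. PointerTable, DataId as a plain integer alias, and insert_pointer are all hypothetical stand-ins chosen for illustration; the real DataSection type in sway-core has a different shape.

```rust
use std::collections::HashMap;

/// Stand-in for the compiler's data entry identifier (assumption).
type DataId = u32;

/// Hypothetical pointer table keyed by (source DataId, instruction offset).
#[derive(Default)]
struct PointerTable {
    /// Maps a load site to the slot of its pointer entry.
    by_site: HashMap<(DataId, u64), usize>,
    /// Pointer entries in insertion order.
    entries: Vec<(DataId, u64)>,
}

impl PointerTable {
    /// Return the slot for this load site, creating it on first use.
    /// The same source loaded at two sites gets two separate slots.
    fn insert_pointer(&mut self, source: DataId, load_site_offset: u64) -> usize {
        let key = (source, load_site_offset);
        if let Some(&slot) = self.by_site.get(&key) {
            return slot;
        }
        let slot = self.entries.len();
        self.entries.push(key);
        self.by_site.insert(key, slot);
        slot
    }
}
```

Because the key includes the load site, two loads of the same non-copy entry never share a slot, while repeating the pre-insertion pass for the same site returns the slot it already created.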
Comment thread sway-core/src/asm_generation/finalized_asm.rs
Comment thread sway-core/src/asm_generation/finalized_asm.rs Outdated
@Dnreikronos commented

bugbot run

@cursor cursor Bot left a comment

✅ Bugbot reviewed your changes and found no new issues!

Reviewed by Cursor Bugbot for commit 6888acd.

codspeed-hq Bot commented Jun 4, 2026

Merging this PR will not alter performance

✅ 25 untouched benchmarks


Comparing Dnreikronos:fix/large-data-section-ice (4eff7df) with master (ac933ff)

@Dnreikronos commented

@ironcev ready when you get a chance. This handles data-section offsets that don't fit a 12-bit LW/LB immediate by falling back to MOVI + ADD $ds + load, and fixes the sizing math so jump/label layout matches the variable instruction count. The part worth your eyes is the pointer dedup change in DataSection: it keys by (source_data_id, load_site_offset) so configurable addresses stay stable.

Development

Successfully merging this pull request may close these issues.

Many assert_eq calls make compiler panic with too big data section
